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ABSTRACT 

Powerful constraints on theories can already be inferred from existing CMB anisotropy data. 
But performing an exact analysis of available data is a complicated task and may become pro¬ 
hibitively so for upcoming experiments with > 10 4 pixels. We present a method for approximating 
the likelihood that takes power spectrum constraints, e.g., “band-powers”, as inputs. We identify 
a bias which results if one approximates the probability distribution of the band-power errors as 
Gaussian—as is the usual practice. This bias can be eliminated by using specific approximations 
to the non-Gaussian form for the distribution specified by three parameters (the maximum likeli¬ 
hood or mode, curvature or variance, and a third quantity). We advocate the calculation of this 
third quantity by experimenters, to be presented along with the maximum-likelihood band-power 
and variance. We use this non-Gaussian form to estimate the power spectrum of the CMB in 
eleven bands from multipole moment 1 = 2 (the quadrupole) to I = 3000 from all published 
band-power data. We investigate the robustness of our power spectrum estimate to changes in 
these approximations as well as to selective editing of the data. 

Subject headings: cosmic microwave background — cosmology: miscellaneous — methods: data analysis 
— methods: statistical 


1. Introduction 

Measurement of the anisotropy of the Cosmic 
Microwave Background (CMB) is proving to be 
a powerful cosmological probe. Proper statistical 
treatment of the data—likelihood calculation—is 
complicated and time-consuming, and promises to 
become prohibitively so in the very near future. 
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Here, we introduce approximations for this likeli¬ 
hood calculation which allow simple and accurate 
evaluation after the direct estimation of the power 
spectrum (Cf) from the data. 

Although it is possible to produce constraints 
on cosmological parameters directly from the data, 
using the power spectrum as an intermediate step 
(e.g. Tegmark (1997)) has several advantages. 
The near-degeneracy of some combinations of cos¬ 
mological parameters (e.g., Bond, Efstathiou & 
Tegmark (1997)) implies the surfaces of constant 
likelihood in cosmological parameter space are 
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highly elongated, making it difficult for search 
algorithms to navigate (Oh, Spergel & Hinshaw 
1998). The power spectrum space is much simpler 
than the cosmological parameter space since each 
multipole moment (or band of multipole moments) 
is usually only weakly dependent on the others, al¬ 
leviating the search difficulties. Although one still 
has the problem left of estimating nearly degen¬ 
erate cosmological parameters from the resulting 
power spectrum constraints, the likelihood given 
the power spectrum constraints is much easier to 
compute than the likelihood given the map data. 

Proceeding via the power spectrum also facil¬ 
itates the calculation of constraints from multi¬ 
ple datasets. Without this intermediate step, a 
joint analysis may often be prohibitively com¬ 
plicated. Aspects particular to each experiment 
(e.g., offset removals, non-trivial chopping strate¬ 
gies) make implementation of the analysis suffi¬ 
ciently laborious that no one has jointly analyzed 
more than a handful of datasets in this man¬ 
ner. Reducing each dataset to a set of constraints 
on the power spectrum can serve as a form of 
data compression which simplifies further analy¬ 
sis. Indeed, most studies of cosmological param¬ 
eter constraints from all, or nearly all, of the re¬ 
cent data have used, as their starting points, pub¬ 
lished power spectrum constraints, [(Lineweaver 
1997,1998a,b,c; Lineweaver & Barbosa 1998; Han¬ 
cock et al. 1998); see also Tegmark (1998) and Ef- 
stathiou et al. (1998) for joint analyses with other 
datasets]. Since the power spectrum constraints 
are usually described with orders of magnitude 
fewer numbers than the pixelized data, we refer 
to this compression as “radical”. 

Are there any disadvantages to proceeding via 
the power spectrum? To answer this question, let 
us consider the analysis procedure. Most anal¬ 
yses of CMB datasets have assumed the noise 
and signal to be Gaussian random variables, and 
to date there is no strong evidence to the con¬ 
trary (although for a different view, see Ferreira, 
Magueijo & Gorski (1998)). The simplicity of this 
model of the data allows for an exact Bayesian 
analysis, which has been performed for almost 
all datasets individually. The procedure is con¬ 
ceptually straightforward: maximize the proba¬ 
bility P (parameters | data) over the allowed pa¬ 
rameter space. Most often, we take the prior 
probability for the parameters to be constant, so 


this is equivalent to maximizing the likelihood, 
P(data|parameters). Because we have assumed 
the noise and signal to be Gaussian, this latter 
is just a multivariate Gaussian in the data ; the 
theoretical parameters enter into the covariance 
matrix. 

Fortunately, if the theoretical signal is indeed 
normally distributed and in addition the sig¬ 
nals are statistically isotropic, the power spec¬ 
trum encodes all of the information about the 
model, and all of the constraints on the parame¬ 
ters of the theory can be obtained from the Ci 
probability distribution: i.e., the likelihood as 
a function of some (cosmological) parameters, 
ai, is just the likelihood as a function of the 
power spectrum determined from those param¬ 
eters: P(data|ai) = P(data|Q[ai]). Thus the 
constraints on the power spectrum may serve as 
our “compressed dataset”. (If the theory is not 
isotropic, as may occur for nontrivial topologies 
(e.g., Bond, Pogosyan & Souradeep (1998)), or is 
non-Gaussian, then the analysis must go beyond 
the isotropic power spectrum.) 

A problem arises though due to the fact that 
the uncertainties in the power spectrum determi¬ 
nation are not Gaussian-distributed. Thus if we 
compress the power spectrum probability distri¬ 
bution to a mean (or mode—the location of the 
posterior maximum) and a variance, we lose the in¬ 
formation contained in the higher order moments. 
One might be tempted to rely on the central limit 
theorem and hope that the posterior for the power 
spectrum is sufficiently close to a Gaussian that 
a simple y 2 procedure will suffice. This is what 
has been done in recent joint analyses of current 
CMB data (e.g., Lineweaver (1997); Lineweaver 
(1998a); Lineweaver (1998b); Lineweaver (1998c); 
Lineweaver & Barbosa (1998); Hancock et al. 
(1998)). and what has been advocated for the 
analysis of satellite data (e.g., Tegmark (1997)). 

Not only is information discarded with this pro¬ 
cedure, however—which one might think of as 
merely increasing the final error bars—but neglect 
of this effect leads to a bias (Jaffe, Knox & Bond 
1998; Bond, Jaffe & Knox 1998; Seljak 1997; Oh, 
Spergel & Hinshaw 1998). Here we show that the 
information loss and its effects, such as this bias, 
can be greatly reduced by assuming the posterior 
distribution to have a specific non-Gaussian form 
parameterized by the likelihood maximum, the co- 
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variance matrix and a third quantity which mea¬ 
sures the noise contribution to the uncertainty of 
each measured amplitude. 

A relatively fast algorithm for determining 
the power spectrum or other parameters is the 
“quadratic estimator” (Tegmark 1997; Bond, Jaffe 
& Knox 1998; Oh, Spergel & Hinshaw 1998), al¬ 
though it still requires 0{n p ix ) operations, where 
?ipi x is the number of pixels in the dataset. Our 
view of quadratic estimation is that, used itera¬ 
tively, it is a particular method for finding the 
maximum of the likelihood and the parameter co- 
variance matrix (Bond, Jaffe & Knox 1998; Oh, 
Spergel & Hinshaw 1998). We emphasize that the 
information loss associated with compression to 
a mode and covariance matrix has nothing to do 
with how that mode and covariance matrix are 
calculated. Other methods of likelihood analysis 
will, of course, suffer the same problems when the 
constraints are reduced to these two sets of quan¬ 
tities. In fact, the quadratic estimation algorithm 
has the advantage that an implementation of it 
as a computer code can be used (with very minor 
changes) to calculate the new noise contribution 
quantity. 

In Section 2, we describe the problems gen¬ 
erated by the non-Gaussianity of the likelihood 
function, and propose solutions which allow rapid 
and simple calculation of cosmological likelihoods 
for CMB data at the price of calculating only 
this single new parameter at each £ (or band). 
Bartlett et al. (1999) have used an ansatz, essen¬ 
tially equivalent to the equal-variance approxima¬ 
tion of Sec. 2.2.2 to arrive at similar results.) In 
Section 3, we test this method via application to 
COBE/DMR data. In Section 4, we extend the 
formalism to more complicated chopping experi¬ 
ments and to the measurement of other amplitude 
parameters such as bandpowers. We apply these 
extensions to the Saskatoon data in Section 5, to 
OVRO, SP and SuZIE in Section 6, and Saskatoon 
combined with COBE/DMR, in Section 7. 

We then apply our procedure to the more 
ambitious task of fitting an eleven parame¬ 
ter model to a compendium of all CMB re¬ 
sults to date in Section 8. Previous explo¬ 
rations of parameter space have been limited 
to much lower dimensionality and have assumed 
Gaussianity (e.g., Lineweaver (1997); Lineweaver 
(1998a); Lineweaver (1998b); Lineweaver (1998c); 


Lineweaver & Barbosa (1998); Hancock et al. 
(1998)). The parameters are the power in eleven 
bins from l = 2 to l = 3000. We study the 
robustness of the resulting maximum likelihood 
power spectrum to assumptions about the noise 
contribution to the error, different binnings and 
selective editings of the data. The binned power 
spectrum, fit to the band-power data, provides an 
excellent tool for visualizing the combined power 
spectrum constraints from all the data. Such a 
figure should replace the usual one of all the band 
powers, which is much harder to interpret. 

Finally, in Section 9 we discuss the results and 
conclude, ending with an exhortation to the com¬ 
munity to calculate and provide the appropriate 
quantities for all future experiments. 

Throughout, we will use Ce to refer to the usual 
CMB power spectrum, and define 


C t = 


1 ( 1 + 1 ) 

2-7T 


C t . 


(1) 


2. Non-Gaussianity of the Likelihood Func¬ 
tion 


2.1. The Problem: Cosmic Bias 

To illustrate the problem, consider an unrealis¬ 
tic “experiment” covering the whole sky with no 
noise. In this case, the data at pixel p, A p is just 
the actual sky signal, s p , and the correlation ma¬ 
trix, S pp >, is just the correlation function c(d pp >). 
Even in this case, the observed sky is just one real¬ 
ization of the underlying power spectrum. To de¬ 
termine these Ci = £(£ + 1)Ci/(2tt), we still must 
resort to the likelihood function. In this case, 

-2InP(A|C/) = lndet S(Ci) + A^'S' _1 A 

= £(2*+1) (in Ci+Ct/C^p) 

i 


up to an irrelevant additive constant. In the sec¬ 
ond line, we define the observed power spectrum 
of this realization as 


Ct 


*(*+ 1 ) 

27T 


1 

2 £ + 1 


53k™i 2 > 


( 3 ) 


where the ai rn are the spherical harmonic coeffi¬ 
cients of the (noise-free) observed sky. 

A Gaussian distribution has the following prop¬ 
erties: it is completely specified by its mean and 
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covariance matrix (higher moments of the distri¬ 
bution can be derived from these); the covariance 
matrix is given by the inverse of the curvature ma¬ 
trix (defined as Tw = — d 2 In P(A\Ci)/dCidCi'); 
and the curvature matrix is independent of Cg. 
None of these properties hold for the distribution 
in Eq. 2, which is non-Gaussian. 1 For a non- 
Gaussian distribution, it is therefore often useful 
to consider the ensemble average curvature, also 
known as the Fisher Matrix, F = {T). 

What happens if, despite the non-Gaussianity 
of the distribution in Eq. 2, one identifies the co- 
variance matrix with the inverse of the curvature 
matrix and then ignores higher order moments? 
The first step would be calculation of the curva¬ 
ture matrix: 


d 2 In P(A\Ce) 
dC e dC e 


(2 C t /Cf - 1/C, 2 ) 6u>. 


( 4 ) 

Note that, unlike for a Gaussian distribution, the 
curvature depends on C,. The natural remedy is to 
evaluate it at the peak of the likelihood, C,. The 
standard error (square root of the variance) is then 
given by 5Cg = 2/(21 + 1)C,. Note though that, 

uncertainties derived in this manner are larger if C, 
has fluctuated upward from the underlying “real” 
value and smaller for a downward fluctuation. If, 
in addition, we ignore higher order moments of the 
distribution, then upward fluctuations are given 
less weight than downward fluctuations, resulting 
in a downward bias for the overall power spec¬ 
trum amplitude. It is generally the lowest multi¬ 
pole moments constrained by an observation that 
have the most non-Gaussian distributions. As has 
been seen (Bunn & White 1997; Bond, Jaffe & 
Knox 1998) and will be seen again below, this may 
contribute to some of the confusion in the commu¬ 
nity regarding the so-called anomalous value of the 
COBE quadrupole. 


In the presence of noisy data over partial ar¬ 
eas of the sky, the likelihood is no longer so 
simple, and must be laboriously calculated (e.g., 
Bond (1994);Bunn & White (1997);Bond & Jaffe 
(1998a);Bond & Jaffe (1998b)). In Fig. 1 we show 
the actual likelihood for the COBE quadrupole 
and other multipoles, along with the Gaussian 


1 The posterior distribution of Cp is not X 21+1 either. It 
is the realization, Cp, that is X 2 t+l-distributed for a fixed 
“underlying” power spectrum, Cp. 


that would be assumed given the curvature ma¬ 
trix calculated from the data. The figures show 
another way of understanding the bias introduced 
by assuming Gaussianity: upward deviations from 
the mean (which is not actually the mean of 
the non-Gaussian distribution, but the mode) are 
overly disfavored by the Gaussian distributions 
while downward ones are overly probable. For ex¬ 
ample, the standard-CDM value of C 2 = 770/xK 2 
is only 0.2 times less likely than the most likely 
value of 150/iK 2 but it seems like a 5-sigma ex¬ 
cursion (4 x 1CU 6 times less likely) based on the 
curvature alone. 

Although it is extremely pronounced in the case 
of the quadrupole this is a problem that plagues 
all CMB data: the actual distribution is skewed 
to allow larger positive excursions than negative. 
The full likelihood “knows” about this and in fact 
takes it into account; however, if we compress 
the data to observed C, ± <r, (or even observed 
Ci and a correlation matrix M,,/) we lose this in¬ 
formation about the shape of the likelihood func¬ 
tion. Because of its relation to the well-known 
phenomenon of cosmic variance, we choose to call 
this problem one of cosmic bias. 

We emphasize that cosmic bias can be impor¬ 
tant even in high-S/N experiments with many pix¬ 
els. We might expect the central limit theorem 
to hold in this case and the distributions to be¬ 
come Gaussian. Indeed they do, at least near the 
peak. However, the central limit theorem does not 
guarantee that the tails of the distribution will ap¬ 
proach those of a Gaussian as rapidly as does the 
region near the peak and there is the danger that 
a few seemingly discrepant points are given con¬ 
siderably more weight than they deserve. Cosmic 
bias has also been noted in previous work (Bond, 
Jaffe & Knox 1998; Seljak 1997; Oh, Spergel & 
Hinshaw 1998). 

Putting the problem a bit more formally, we see 
that even in the limit of infinite signal-to-noise we 
cannot use a simple % 2 test on C, estimates; such 
a test implicitly assumes a Gaussian likelihood. 
Unlike the distribution discussed here, a Gaussian 
would have constant curvature (SCe = constant), 
rather than 5Ce oc Ci as illustrated here. 

We emphasize that the problem as outlined here 
is easily solved in principle: just calculate using 
the full likelihood function. Unfortunately, this is 
more easily said than done—direct calculation of 
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Fig. 1. DMR Likelihoods P(A\Ce) for various 
values of £, as marked. The horizontal axis is Cg = 
£(£ + l)Q/(27r). The upper right panel gives the 
cumulative probability. The solid (black) line is 
the full likelihood calculated exactly. The dashed 
(red) line is the Gaussian approximation about the 
peak. 


the likelihood function takes 0(rip ) operations per 
parameter-space point. For the n v > 10 5 datasets 
already coming, this is prohibitively expensive. In¬ 
deed, it is not even clear how to perform the op¬ 
erations necessary even to find the likelihood peak 
and variance in a reasonable time (Bond, Jaffe & 
Knox 1998; Oh, Spergel & Hinshaw 1998). It is 
likely that other forms of data compression and/or 
new algorithms will be necessary even at this stage 
of the analysis. Signal-to-noise Eigenmodes, dis¬ 
cussed in Appendix A, have been suggested as a 
useful compression tool (Tegmark, Taylor & Heav¬ 
ens 1997; Jaffe, Knox & Bond 1998) and used in 
some analyses (Bond & Jaffe 1998a,b; Bunn & 
Sugiyama 1995; Bunn & White 1997). 

Instead, we must find efficient ways to approx¬ 
imate the likelihood function based on minimal 
information. In the rest of this paper, we discuss 
two approximations, each motivated by different 
aspects of our knowledge of the likelihood func¬ 
tion. Each requires only knowledge of the like¬ 
lihood peak and curvature (or variance) as well 
as a third quantity related to the noise proper¬ 
ties of the experiment. Alternately, for already- 
calculated likelihood functions, each approxima¬ 
tion gives a functional form for fitting with a small 
number of parameters. 


2.2. The solution: approximating the like¬ 
lihood 

2.2.1. Offset lognormal distribution 

We already know enough about the likelihood 
to see a solution to this problem. For a given 
multipole £, there are two distinct regimes of like¬ 
lihood. Add uniform pixel noise and a finite 
beam to the simple all-sky “experiment” consid¬ 
ered above. Now, the likelihood has contributions 
from the signal, ae m , and the noise, ngm (after 
transforming again to spherical harmonics). 


-21nP(A|C*) = 


t 


In (CgB 2 -{- A//) F 


Th 


CtBj 


■ A fe 


(as usual up to an irrelevant additive constant), 
with A ft = £(£ + l)N(/(2n), where Ng — {\ng m \ 2 ) 
is the noise power spectrum in spherical harmon¬ 
ics, and Vg = [£(£ +1)/(27t)] m Wm\ 2 /(2£ + 1) 
is the power spectrum of the full data (noise plus 
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beam-smoothed signal); we have written it as a 
different symbol from above to emphasize the in¬ 
clusion of noise and again use script lettering to 
refer to quantities multiplied by £(£ + l)/(2n). 

Now, the likelihood is maximized at Ct = (T>t — 
Aft)/B 2 and the curvature about this maximum is 
given by 


r (Q dHnP(A\C e ) 2i+l 2 

*W = - xn.xn.. - “ ) d W 


dCfdCfj 


( 6 ) 


so the error (defined by the variance) on a Ct is 


dCt — (Ct + Afz/Bf} / y/£ + 1/2. (7) 


Note that in this expression there is once again 
indication of a bias if we assume Gaussianity: 
upward fluctuations have larger uncertainty than 
downward fluctuations. But this is not true for 
Z( where Zt is defined so that 5Zt ot 5Ct/(Ct + 
Ni/Bj). More precisely, Zt = \n(Ct + Aft/Bf). 
Since SZt is proportional to a constant, our ap¬ 
proximation to the likelihood is to take Zt as nor¬ 
mally distributed. That is, we approximate 


is not a function of Z. We immediately know of 
one such transformation which would seem to do 
the trick: 


dZ L _ _i/2 
dC e ~ Ll 


( 12 ) 


where the 1/2 indicates a Cholesky decomposition 
or Hermitian square root. In general, this will be 
a horrendously overdetermined set of equations, 
N 2 equations in N unknowns. However, we can 
solve this equation in general if we take the cur¬ 
vature matrix to be given everywhere by the di¬ 
agonal form for the simplified experiment we have 
been discussing (Eq. 6). In this case, the equa¬ 
tions decouple and lose their dependence on the 
data, becoming (up to a constant factor) 


dZt 

dCt 


(Ct + Aft/Bf) 


(13) 


The solution to this differential equation is just 
what we expected, 


Zt — In (Ct +Jft/B/) (14) 


with correlation matrix 


-2 lnP(A\C e ) = ^ Z t M$Z v (8) 

w 

(up to a constant) where Mffi = (Ct+xt)M^) (Cf + 
Xf) where M^ is the weight matrix (covariance 
matrix inverse) of the Ct, usually taken to be the 
curvature matrix. We refer to Eq. 8 as the off¬ 
set lognormal distribution of Ct- Somewhat more 
generally we write 

Zt = In (Ct + xt) (9) 


for some constant xt, which for the case at hand 
is xt = Afe/B f. 

It is illustrative to derive the quantity Zt in a 
somewhat more abstract fashion. We wish to find 
a change of variables from Ct to Zt such that the 
curvature matrix is a constant: 


u - r w 

dZt 


= 0. 


( 10 ) 


That is, we want to find a change of variables such 
that 



® z l ( T (c)\ 1 dZu 
^ dCt V Ju> dCf 


( 11 ) 


T<~ z )) = _ V ’LL' _ 

V )u> (C L +Af L /B 2 L ) (C L . +Af L '/Bl,) 

(i5) 

where £ = L and £' = L'. Please note that we 
are calculating a constant correlation matrix; the 
Ct in the denominator of this expression should 
be taken at the peak of the likelihood (i.e., the 
estimated quantities). 

We emphasize that, even for an all-sky con¬ 
tinuously and uniformly sampled experiment (for 
which Eq. 6 is exact), this Gaussian form, Eq. 8, 
is only an approximation, since the curvature ma¬ 
trix is given by Eq. 6 only at the peak. Nonetheless 
we expect it to be a better approximation than a 
naive Gaussian in Ct (which we note is the limit 
xt —> oo of the offset lognormal). 

We also note that there is another approxima¬ 
tion involved in this expression: by choosing just a 
single independent change of variable for each t we 
assume that the correlation structure is unchanged 
as you move away from the likelihood peak. 

Often a very good approximation to the cur¬ 
vature matrix is its ensemble average, the Fisher 
matrix, F = (F). Below, unless mentioned oth¬ 
erwise, we use the Fisher matrix in place of the 
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curvature matrix. However, we will see that in our 
application to the Saskatoon data, the differences 
between the curvature matrix and Fisher matrix 
can be significant. 


2.2.2. The Equal Variance Approximation 


In this subsection we consider an alternate form 
for the likelihood function £ oc P(A\Cg) that is 
sometimes a better approximation than the offset- 
lognormal form. The approximation is exact in 
the limit that the observations can be decomposed 
into modes that are independent, with equal vari¬ 
ances. For example, for a switching experiment in 
which the temperature of G pixels are measured, 
with the same noise <jn at each point, and such 
that each pixel is far enough from the others that 
there is no correlation between the points, the like¬ 
lihood can be written as 


In £ = -G/2 



E A/g 


' N J 


(16) 


where is such that Cx,ij = cr^Sij and di are the 
pixel temperatures. The independent pixel ideal¬ 
ization was very close to the case for the OVRO 
experiment, and, as we show in Section 6, the 
calculated likelihood is well approximated by this 
equation. The maximum likelihood occurs at a 
signal amplitude <7 t which is related to the data 
by = E ^ f / G - <t 2 n . 

If we define Z = In (oj, + x ), Z = In (erf- + x ), 
and x = o jy then we can rewrite Eq. 16 in a form 
that will be useful for relating it to the previous 
offset- lognormal form: 


ln£/£ 



(l-(Z-Z))jl.7) 


Note that if we consider only a single £ then the 
above form applies to Eq. 5 as well, with G = 2£+l 
and Z = ln(Cg + xg). We know this should be 
the case since the likelihood of Eq. 5 (for a single 
t) is also one for independent modes (aim) with 
equal variances (Cg + xg). If we fix G and x for 
each mode (e.g., band of £), we refer to this as the 
“equal variance approximation.” Also note that 
the first term in the expansion of Eq. 17 in Z — Z 

is —G/2 (^Z — zj , which with the identification, 

G = 2Tl z \ is the offset-lognormal form. Thus 
when the modes have equal variance and are inde¬ 
pendent, then the offset lognormal form is simply 


the first term in a Taylor expansion of the equal- 
variance form. An advantage of the full form is 
that the asymptotic form — (G/2)(Z—Z) linear in 
Z for large signal amplitudes holds (and thus gives 
a power law rather than exponential decay in £), 
whereas the offset lognormal is dominated by the 
Z 2 . An advantage of the offset-lognormal form is 
that it does not require the existence of equal and 
independent modes. Figure 2 shows that for the 
range of relevance for the likelihoods for DMR, the 
offset lognormal and the equal/independent vari¬ 
ance likelihood approximations are quite close over 
the dominant 2-sigma falloff from maximum. We 
have found this to generally be true. 

For either form, three quantities need to be 
specified, the noise-related offset x, Z and G. 
Given x, Z is determined from the maximum likeli¬ 
hood and G can be determined from the curvature 
of the likelihood. One could also specify the am¬ 
plitudes C at three points, e.g., at the maximum 
and the places where £/£ falls by e -1 / 2 , the upper 
and lower one-sigma errors if the distribution were 
fit on either side by a Gaussian. Forcing the ap¬ 
proximation to pass through these points enforces 
values of x, Z and G. 

In Section 8, we apply these approximations 
to power spectrum estimation from current data 
for which the practice has been to quote a signal 
amplitude with upper and lower one-sigma errors, 
say C. C u and C,i- Often these are Bayesian esti¬ 
mates, determined by choosing a prior probability 
for C and integrating the likelihood. Sometimes 
the e -1 / 2 points are given, which are slightly easier 
to implement in fitting for x and G. Since the tail 
of Eq. 17 is quite pronounced, resulting in a dra¬ 
matic asymmetry in £ between the up and down 
sides of the maximum even in the Z variable, we 
have found that just using the second derivative of 
the likelihood or the Fisher matrix approximation 
to it to fix G is not as good as assuming the offset 
lognormal and requiring that the functional forms 
match at the uppe r e -1 / 2 point. Thus, if the er¬ 
ror ac = 1/V pG) is from the curvature or Fisher 
matrix, then we prefer the choice 


G = [e" CTZ - (1 - a z ) 


-l 


oz = 


C + x 


1 


y/FW 

(18) 


rather than the curvature form G = 2/a 2 z , or the 
[cosh(az) — l]” 1 average of the ±1/2 widths. This 
is what was done in Fig. 2, and in all subsequent 
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figures. 

Fig. 2 shows that a linear —GZ/2 asymptote in 
the log-likelihood is not always correct and some¬ 
times the lognormal does better. That the tail 
often declines faster can be understood in terms 
of an effective number of modes G(C -1 ) which 
increases as C increases. To demonstrate this, 
it is useful to consider the likelihood behavior 
for “signal-to-noise” eigenmodes, which are linear 
combinations of the pixelized data which make 
them statistically independent for Gaussian sig¬ 
nals and noise. The data is then characterized by 
observed amplitudes dk , with a noise contribution 
transformed to give unity variance, and a signal 
contribution with amplitude CXk , in terms of a 
“signal-to-noise” eigenvalue A k and an overall am¬ 
plitude C. Such transformations have been much 
discussed in the literature (e.g., Bond (1994);Bunn 
& Sugiyama (1995);Bunn & White (1997);Bond, 
Jaffe & Knox (1998)), and we will not go into the 
details here; for more details on using this formal¬ 
ism to examine the overall form of the likelihood 
function, see Appendix A. 

The cases in which we expect Eq. 17 to be a 
good approximation are those in which the eigen¬ 
modes have a broad region over which A varies 
slowly and a very rapid falloff towards zero be¬ 
yond. This is exact for the independent pixel 
points described above, with A = a\la\ ! the same 
for all G inodes. If there were a number of fre¬ 
quency channels as well as pixels, only the lin¬ 
ear combinations which are flat in thermodynamic 
temperature have this A, and the rest are zero. 
For cases where the equal variance approximation 
is not exact, the signal-to-noise modes will have 
different eigenvalues A. One might then take an 
effective G to be the number of modes with A > 1 
or some other cutoff (since these are the ones that 
have greater signal than noise). However, then G 
grows as Cf increases, altering the power-law tail. 

Thus, the very general approach of “signal-to- 
noise” eigenmodes has allowed us to understand 
that the simple law with an effective G will usually 
overshoot the high C tail somewhat, even though 
it fits very well to 1-sigma, and usually beyond. 
The offset lognormal form, motivated by it, could 
err on either side, since it would presuppose a 
specific sort of increase in the number of eigen¬ 
modes contributing. Fortunately either approxi¬ 
mation seems to work well enough to allow accu¬ 


rate parameter estimation from a very small set of 
numbers. 


3. Application to COBE/DMR 

We first apply these methods to the anisotropy 
measurements of the DMR instrument on the 
COBE satellite (Bennett et al. 1996). The DMR 
instrument actually measured a complicated set of 
temperature differences 60° apart on the sky, but 
the data were reported in the much simpler form 
of a temperature map, along with appropriate er¬ 
rors (which we have expanded to take into account 
correlations generated by the differencing strategy, 
as treated in Bond (1994), following Lineweaver & 
Smoot (1993). The calculation of the theoretical 
correlation matrix includes the effects of the beam, 
digitization of the time stream, and an isotropized 
treatment of pixelization, using the table given by 
Kneissel & Smoot (1993), modified for resolution 
5. We use a weighted combination of the 31, 53 
and 90 GHz maps. Because most of the infor¬ 
mation in the data is at large angular scales, we 
use the maps degraded to “resolution 5” which 
has 1536 pixels. Further, we cannot of course ob¬ 
serve the entire CMB sky; we use the most recent 
galactic cut suggested by the COBE/DMR team 
(Bennett et al. 1996), leaving us with 999 pixels 
to analyze. We use the galactic, as opposed to 
ecliptic, pixelization. 

Before we can apply our procedure to COBE/DMR, 
we must discuss how to deal with the partial sky 
coverage of any real CMB experiment. To a good 
approximation, the COBE/DMR Fisher matrix 
can be written as (e.g., Knox (1995);Jungman et 
al. (1996);Bond, Jaffe & Knox (1998);Hobson & 
Magueijo (1996)) 


~ / sk y 


2 £ + 1 


Ci- 


1 ) 


1 —2 


27 xwB'j 


5w- (19) 


In the language of our new procedure, this means 
that we still expect to be able to approximate 
the likelihood as a Gaussian in the same Zi = 
ln(C^ + xe), but now we can only approximate 
the term xi ~ £(£ + l)/[27 twB 2 (£)\, where w is 
the weight per solid angle of the experiment. In 
terms of the total weight, W, of the experiment, 
w = W/(47r/ sky ). A more detailed approxima¬ 
tion for a particular experiment might be possible, 
but as we will see below, this expression does ex- 



tremely well in reproducing the full non-Gaussian 
likelihood. 

We have calculated the maximum-likelihood 
power spectrum and its error (Fisher) matrix us¬ 
ing the quadratic estimator procedure of (Bond, 
Jaffe & Knox 1998). With knowledge of the 
COBE/DMR beam (Bennett et al. 1996) along 
with the noise properties of the experiment, we 
can calculate the necessary quantity xt- For 
COBE/DMR, we have an average inverse weight 
per solid angle of w _1 = 9.5 x 10 -13 (equivalent 
to an RMS noise of 22 /rK on 7° x 7° pixels). 

With these numbers, we show the full likelihood 
in comparison to the “naive Gaussian” approx¬ 
imation, as well as our offset lognormal ansatz. 
While the naive Gaussian approximation consis¬ 
tently overestimates the likelihood below the peak 
and underestimates it above the peak, the lognor¬ 
mal form reproduces the full expression extremely 
well in both regimes. 

The Gaussian form of the offset lognormal form 
makes using the power spectrum estimates for pa¬ 
rameter estimation very simple: we evaluate a \ 2 
in the quantity Zt rather than Ct (although the 
model is now nonlinear in the spectral parame¬ 
ters) . 

Again, we see how well our offset lognormal 
ansatz performs; it reproduces the peak and er¬ 
rors on the parameters. In particular, it elim¬ 
inates the “cosmic bias” discussed above, find¬ 
ing essentially the correct amplitude, <Ts for each 
shape probed, unlike the naive Gaussian in Ct, 
which consistently underestimates the amplitude. 
Of course, far from the likelihood peak, even the 
offset lognormal form misrepresents the detailed 
likelihood structure since no Gaussian correctly 
represents the softer tails of the real distribution, 

_l/2 

which goes asymptotically as the power law C t ' ; 
the offset lognormal approximation is asymptoti¬ 
cally lognormal with a much steeper descent; the 
equal-variance form can in principle reproduce the 
asymptotic form better. This behavior can be im¬ 
portant for the case of upper limits, i.e., when the 
likelihood peak is at Ct = 0. We discuss this spe¬ 
cial case in Section 6. 



Fig. 2.— Full and approximate COBE/DMR like¬ 
lihoods P(A\Ce) for various values of i, as marked. 
The horizontal axis is Ct = t(l + 1 )C//( 27 t). The 
upper right panel gives the cumulative probability. 
The solid (black) line is the full likelihood calcu¬ 
lated exactly. The short-dashed (red) line is the 
Gaussian approximation about the peak. The dot¬ 
ted (cyan) line is a Gaussian in In Ct ; the dashed 
(magenta) line is a Gaussian in In (Ct + xt), as dis¬ 
cussed in the text. The dot-dashed (green) line is 
the equal-variance approximation. 
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4. General Treatment 



Fig. 3.— Exact and approximate likelihood con¬ 
tours for COBE/DMR, for the cosmological pa¬ 
rameters n s and as (with otherwise standard 
CDM values). Contours are for ratios of the like¬ 
lihood to its maximum equal to exp— i / 2 /2 with 
v = 1,2,3. Upper panel is for the full likelihood 
(dashed) and its offset lognormal approximation as 
a Gaussian in In (Cz + xz) (solid; see text); lower 
panel shows the full likelihood and its approxima¬ 
tion as a Gaussian in Cz- 


4.1. Chopping Experiments 

We wish to generalize this procedure to the case 
of experiments that are not capable of estimating 
individual multipole moments and/or chopping 
experiments. By chopping experiments, which 
have been the norm until very recently, we mean 
those that rather than report the temperature at 
various positions on the sky, report more compli¬ 
cated linear combinations, with a sky signal given 

by 

d 2 x Hi(x) — (x) = Y H i,trnazm (20) 

Zm 


for some beam and switching function H{x)\ Hi^m 
and cizm are the spherical-harmonic transforms of 
H and the temperature, respectively. This induces 
a signal correlation matrix given by 

Crw = (BiSi.) = Ji0Y) Wii ' (21) 


Here, the window function matrix, gener¬ 

alizes the beam Bf of a mapping experiment and 
is given by 




47T 

2 £+ 1 




inn 


( 22 ) 


(this should not be confused with the “window 
function,” given by Wz = Yi Wu(£)/N pix .) More¬ 
over, for many experiments, the noise structure 
can be considerably more complicated, and may 
not be reducible to a simple noise correlation func¬ 
tion or power spectrum (that is, correlations in the 
noise may not just be functions of the distance be¬ 
tween points); instead, we may have to specify a 
general noise matrix 


CnH' — ( TliTli ') (23) 


How can we generalize our previous procedure 
to account for this more complicated correlation 
structure? We will take the general offset lognor¬ 
mal form of the likelihood, a Gaussian in Zz = 
ln(C^ + xz), as our guide. We have already noted 
that in the case of incomplete sky coverage, or in¬ 
homogeneous noise, Eq. 12 has no solution. Thus 
we are only searching for a reasonable ansatz to 
try for X£. 
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We begin by noting that xt represents the noise 
contribution to the error. For the full-sky case the 
ratio Ct/xt is the ratio of the signal contribution 
to the error to the noise contribution to the error: 


Ct/xt 


( F i 


- 1 / 2 ' 


( F u 1/2 


) signal 
) noise 


(24) 


expect that the correlation matrix at the likeli¬ 
hood peak would give the best value for xb-) 
Alternatively, we sometimes use Eq. 26 but 
make use of the approximation 

Tr (Cf'CT'BC^CT'B) = £( 2£ + 1 )/ sky (29) 

l£B 


since (F u 1/2 ) s i g „ a i = \/2/(2£+ 1 )C t and {F u 1/2 ) no ise = 
yj2/(ll+ l)Ni/Bj = y/2/(2l+l)xt. Writing 
Ct/xt in Eq. 24 in terms of Fisher matrices allows 
us to generalize to arbitrary experiments. 

Before writing down the general procedure, we 
must introduce a little more notation. Instead 
of estimating every C(, we estimate the binned 
power spectrum Cb = E^gs &/ E^pb 1> i n ^ins 
labeled by B. Let Ct,b be the contribution to 
the signal covariance matrix from bin B , i.e., 
Cr,Bij = YleeB^ B Wij(£)/£. The Fisher matrix 
for Cb (whose inverse gives the covariance matrix 
for the uncertainty in Cb) is given by 

F%1 = Tr (C~ 1 Ct,bC~ 1 Ct,b / ) /(C b C b >) (25) 

where C is the total covariance matrix, C = Ct + 

Cn ■ Of course, in the limit of no noise, C = Ct 
and in the limit of no signal, C = Cn- Thus Eq. 24 
generalizes to 


Cb/xb 


Tr {C^Ct,bC^C t ,b) 
Tr (C^Ct,bC^Ct.b) ' 


(26) 


Evaluation of the denominator of Eq. 26 is 
sometimes difficult, as practical shortcuts in the 
calculation of the window function matrix may 
make it singular or give it negative eigenvalues. 
To avoid this calculation, we sometimes generalize 
the expression for xe by noting that for a homoge¬ 
neously sampled, full-sky map: 


(p 1 /2'i . _ p 1/2 _ /p l/2\ 

y-T ££ ) signal — \ r il /noise 


(27) 


and therefore replace Eq. 26 with 
Cb/xb = 


/Tr {C^Ct,bC^C t ,b) 
Tr(C- 1 C T ,BC- 1 C T ,B) 


-1. (28) 


(For the all-sky, uniform noise case, Xb thus de¬ 
fined will be independent of Cb] in a realistic ex¬ 
periment this will no longer hold. In practice, we 


which is exact for maps in the limit / s k y —> 1. 

To summarize, our offset lognormal ansatz is to 
take Zb = ln(Cs + Xb) as Gaussian distributed, 
with xb calculated from Eq. 28 and covariance 
matrix given by the inverse of 

F ( b % = ZbZb'F^I no sum. (30) 

Alternately, we can use these same quantities in 
the equal-variance form of Eqs. 16-18. 


4.2. Bandpowers 

Most observational power spectrum constraints 
to date are reported as “band-powers” rather than 
as estimates of the power in a power-spectrum bin, 
as we have been assuming above. These band- 
powers are the result of assuming a given shape 
for the power spectrum and then using one par¬ 
ticular modulation of the data to determine the 
amplitude. With Ct thus fixed, the band-power, 
C B p, is given by 


Eew^WeCe ^W e C e /£ 
^ E tT$T>W t ~ E t W t /l 


(31) 


with We given by the trace of the window function 
matrix. To find Xbp, replace Ct,b with Cp in 
Eq. 28. 

Note that in order to compare this observationally- 
determined number, Cbp, with other theories, 
specified by different C;s with different shapes and 
amplitudes, we need a means for calculating the 
expected value of Cbp, (Cbp), given an arbiratry 
Ci. It has been often assumed that (Cbp) for ar¬ 
bitrary Ci is also given by the right-hand side of 
Eq. 31. However, this is only strictly true in the 
case of a diagonal signal covariance matrix. The 
generalization to the case of non-diagonal signal 
matrices is discussed in Bond, Jaffe & Knox (1998) 
and Knox (1999). 
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4.3. Linear Combinations 

Nothing we have derived so far restricts us to 
the likelihood as a function of Ce per se; any other 
measure of amplitude will also have a likelihood in 
this form. That is, we write a general amplitude 
as 

( 32 ) 

e 

for some arbitrary filter or filters, fi(£), and Ce = 
£(£ + l)C//(27r) as usual. This filter could be, for 
example, one designed to make the uncertainties 
in the af uncorrelated, as in the following section. 

What is the likelihood for this amplitude, 
rather than Ce at a single 11 We first change 
variables from Ce to cr, as in Eq. 11. If we choose 
window functions that do not overlap in £, the 
inverse Fisher matrix then becomes 

( 33 ) 

w 

If the original (Ce) Fisher matrix has the simple 
form of Eq. 6, then we see that we can just filter 
the individual terms in any of the ensuing equa¬ 
tions with the same /, and our ansatz will still 
hold. Explicitly, we would expect the variable 


Zi 


In 


a i + M £ ) x t 

t 


(34) 


to be distributed as a Gaussian for the offset log¬ 
normal form; for the equal-variance form the gen¬ 
eralization is clear. 


4.4. Orthogonal Bandpowers 

We can apply this to a particularly useful set 
of linear combinations, the so-called orthogonal 
bandpowers (Bond, Jaffe & Knox 1998; Tegmark 
1997; Hamilton 1997a,b; Tegmark & Hamilton 
1998). If we have a set of spectral measurements 
Cb in bands B with a weight matrix Mbb' = 
Fbb' = (SCbSCb '), we can form a new set of mea¬ 
surements which have a diagonal error matrix by 
applying a transformation like Db = M bb ,Cb' = 
F b ' B iCb'- The 1/2 power represents any matrix 
such as the Cholesky decomposition or Hermi- 
tian square root which satisfies A 1 / 2 (A 1 / 2 ) T = A. 
These linear combinations will have the property 
that ( SDbSDb ') = Sbb 1 (note the similarity to 


the calculations of Section 2.2). For calculating 
the “naive” quantity, 

X 2 — ^(C_b — Cb)M B b'(Cb' — Cb') (35) 

BB' 

= ^CDb-Sb) 2 , (36) 

B 

(hats here refer to observed quantities) these or- 
thogonalized bands don’t change the results. How¬ 
ever, because the error and Fisher matrices are 
both diagonal our likelihood ansatz can be applied 
very cleanly, since the off-diagonal correlations are 
zero and so we might expect them to represent the 
exact shape of the likelihood around the peak more 
accurately. Now, we will take the quantity Cb = 
ln(£>B + Vb) to be distributed as a Gaussian with 
correlation matrix (5(,b&C,b') = (Db + Ub) 2 Sbb' ■ 
This again involves the further approximation that 
the correlation structure far from the peak remains 
given by that near the peak (encoded in the cur¬ 
vature matrix of the distribution at the peak). 

From the previous subsection, we further know 
that if we have a set of quantities xb appropriate 
for approximating the Cb likelihood (as in Eq. 28), 
then we should be able to set ys = F bb ,xb'- 
(Note that we can also use these quantities in the 
equal-variance approximation, which does not oth¬ 
erwise have a simple multivariate generalization.) 

We use these orthogonalized bandpower results 
for the cosmological parameter estimates using the 
SK data in the following sections. 

5. Application to Saskatoon 

We apply this ansatz to the Saskatoon experi¬ 
ment, perhaps the apotheosis of a chopping exper¬ 
iment. The Saskatoon data are reported as com¬ 
plicated chopping patterns (i.e., beam patterns, 
H, above) in a disk of radius about 8° around 
the North Celestial Pole. The data were taken 
over 1993-1995 (although we only use the 1994- 
1995 data) at an angular resolution of 1.0-0.5° 
FWHM at approximately 30 GHz and 40 GHz. 
More details can be found in Netterfield et al. 
(1997) and Wollack et al. (1997). The combina¬ 
tion of the beam size, chopping pattern, and sky 
coverage mean that Saskatoon is sensitive to the 
power spectrum over the range £ = 50-350. The 
Saskatoon dataset is calibrated by observations of 
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supernova remnant, Cassiopeia- A. Leitch and col¬ 
laborators (Leitch 1998) have recently measured 
the flux and find that the remnant is 5% brighter 
than the previous best determination. We have 
renormalized the Saskatoon data accordingly. 

We calculated Cn for this dataset in Bond, Jaffe 
& Knox (1998). We combine these results with 
the data’s noise matrix to calculate the appropri¬ 
ate correlation matrixes (in this case, the full cur¬ 
vature matrix) for Saskatoon and hence the appro¬ 
priate xb (Eq. 28) and thus our approximations to 
the full likelihood. In Figure 4, we show the full 
likelihood, the naive Gaussian approximation, and 
our present offset lognormal and equal-variance 
forms. Again, both approximations reproduce the 
features of the likelihood function reasonably well, 
even into the tails of the distribution, certainly 
better than the Gaussian approximation. They 
seem to do considerably better in the higher-^ 
bands; even in the lower £ bands, however, the ap¬ 
proximations result in a wider distribution which 
is preferable to the narrower Gaussian and its re¬ 
sultant strong bias. Moreover, we have found that 
we are able to reproduce the shape of the true like¬ 
lihood essentially perfectly down to better than 
“three sigma” if we simply fit for the Xb (but of 
course this can only be done when we have already 
calculated the full likelihood precisely what we 
are trying to avoid!). For existing likelihood cal¬ 
culations, this method can provide better results 
without any new calculations (see Appendix B for 
our recommendations for the reporting of CMB 
bandpower results for extant, ongoing, and future 
experiments). 

We also show the usefulness of the offset lognor¬ 
mal form in cosmological parameter determination 
in Figure 5, for which we use the orthogonalized 
bandpowers discussed above. As expected, it re¬ 
produces the overall shape of the likelihood func¬ 
tion quite well, although it does better for a fixed 
shape (n s in this case). 

We have found that the shape of the power 
spectrum used with each bin of t can have an 
impact on the likelihood function evaluated us¬ 
ing this ansatz. Similarly, a finer binning in i 
will reproduce the full likelihood more accurately. 
Although the maximum-likelihood amplitude at a 
fixed shape (n s ) does not significantly depend on 
binning or shape, the shape of the likelihood func¬ 
tion along the maximum-likelihood ridge changes 



Fig. 4.— Full and approximate Saskatoon like¬ 
lihoods. As in Fig. 2. The solid (black) line is 
the full likelihood calculated exactly. The short- 
dashed (red) line is the Gaussian approximation 
about the peak. The dotted (cyan) line is a Gaus¬ 
sian in lnC^; the dashed (magenta) line is a Gaus¬ 
sian in In (Ce + xe), as discussed in the text. The 
dot-dashed (green) line is the equal-variance ap¬ 
proximation. 
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with finer binning and with the assumed spectral 
shape. 

As an aside, we mention several complications 
that we have noted in the analysis of the Saska¬ 
toon data. Because of the complexity of the Saska¬ 
toon chopping strategy, we have found that the 
signal correlation matrix, Ct is not numerically 
positive definite; removing the negative eigenval¬ 
ues can change the value of Cb by as much as 5% 
in some bins. This should be taken as an estimate 
of the accuracy of our spectral determinations due 
to these numerical errors. 

We have also found that the Fisher matrix, 
which we usually use as an estimate of the (in¬ 
verse) error matrix for the parameters, can dif¬ 
fer significantly from the true curvature matrix. 
This difference can be especially marked in low-f 
bins for which the sample and/or cosmic variance 
can be considerable, potentially resulting in large 
fluctuations in this error estimate as well. In the 
Saskatoon plots here, we use the actual curvature 
matrix in place of the Fisher matrix. In a forth¬ 
coming paper (Knox & Jaffe 1998), we will ad¬ 
dress these and other issues of implementation of 
the quadratic estimator for Ce- 

As we will see in the following, these concerns 
become less important when combining Saskatoon 
with COBE/DMR, since the results are mostly de¬ 
pendent on the broad-band power probed by each 
experiment. Moreover, we expect that these diffi¬ 
culties are considerably more likely in the case of 
chopping experiments, for which our expression for 
Xb, Eq. 26 is somewhat ad hoc. Most future CMB 
results will be for “total-power” (i.e., mapping) ex¬ 
periments, and the satellites MAP and Planck will 
be (nearly) all-sky, like COBE/DMR, for which 
the offset lognormal form has proven most excel¬ 
lent. In any case, even with present-day data, our 
ansatz provides a far better approximation to the 
full likelihood than a simple Gaussian in Ce as was 
used for some global analyses of current CMB data 
such as Lineweaver (1997); Lineweaver (1998a); 
Lineweaver (1998b);Lineweaver (1998c);Lineweaver 
& Barbosa (1998);Hancock et al. (1998)). 

6. Application to OVRO, SP and SuZIE 

One of the problems we hoped to solve with 
better approximations to the likelihood functions 
than Gaussian was how to treat the valuable data 


with upper limits or very weak detections. In par¬ 
ticular, the data from OVRO (Myers, Readhead & 
Lawrence 1993) and SuZIE (Church et al. 1997) is 
useful for constraining open universe models with 
power spectra that do not fall off rapidly enough at 
high £. Although the Gaussian form does not work 
well here, the offset lognormal does much better 
and the form of Section 2.2.2 works very well, as 
is shown for OVRO and SP in the top panel of 
Fig. 6. 

The likelihood for the SuZIE results is also 
shown. The authors (Church et al. 1997) plotted 
the likelihood for the amplitude for several differ¬ 
ent models, which we have fit from the published 
figure. Although reported as an upper limit, the 
likelihood is peaked at positive power, but zero 
power is only rejected at ~ ler. We note as an aside 
that a simple flat bandpower (Ce = const) is not 
quite sufficient to contain all of the information in 
the SuZIE data: the likelihood function changes 
slightly for models with different shapes, defined 
as in Eq. 31 —most of the physically motivated 
models (e.g., sCDM, ACDM, etc.) have roughly 
the same bandpower curves, but a flat bandpower 
gives a slightly different one as shown in the fig¬ 
ure, and extreme open models are more similar to 
the flat-bandpower case than to sCDM. We also 
note that our equal-variance approximation per¬ 
forms slightly better for the flat bandpower, while 
the sCDM model is fit better by the offset lognor¬ 
mal. In any case, we again find that in all of these 
cases our approximations fit the likelihoods much 
better than any naive Gaussian approach would. 

7. Results: COBE/DMR + Saskatoon 

As a further example and test of these meth¬ 
ods, we can combine the results from Saskatoon 
and COBE/DMR in order to determine cosmo¬ 
logical parameters. For this example, we use the 
orthogonal linear combinations as described in the 
previous section. In Figure 7 we show the like¬ 
lihood contours for standard CDM, varying the 
scalar slope n s and amplitude as- As before, 
we see that the naive y 2 procedure is biased to¬ 
ward low amplitudes at fixed shape (n s ), but that 
our new approximation recovers the peak quite 
well. The full likelihood gives a global maximum 
at (n s ,as) = (1.15,1.67), and our approxima¬ 
tion at (1.13,1.58), while the naive y 2 finds it at 
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(1.21,1.55), outside even the three-sigma contours 
for the full likelihood. We can also marginalize 
over either parameter, in which case the full like¬ 
lihood gives n s = 1.17lolo?> a s — l-SSlgJi; our 
ansatz gives n s = I.I 4 +Q 05 , as = 1.60 ± 0.15; and 
the naive y 2 gives n s = 1 . 21 +QQg, as = 1-55+oio- 
(Note that even with the naive y 2 w e marginalize 
by explicit integration, since the shape of the like¬ 
lihood in parameter space is non-Gaussian in all 
cases.) 

8. Parameter Estimation 


completely describes the model: 


Xcal 

Zf 

Zl 


'E(Z t i-Zt)M? j (Z* j 

hj 

ST' i u a ~ l ) 2 , 

~2 ’ 


Zf) + X%<$) 
(40) 


In (Hi + Xj); 



^a(l) fiBCB 4“ 


(41) 

(42) 


Mij (Di + Xi) (Dj + Xj) no sum;(43) 


Above, we have discussed many different ap¬ 
proximations to the likelihood C. Here we discuss 
finding the parameters that maximize this likeli¬ 
hood (minimize the y 2 = — 21n£). We then ap¬ 
ply our methods to estimating the power in dis¬ 
crete bins of £. This application provides another 
demonstration of the importance of using a better 
approximation to the likelihood than a Gaussian. 


The likelihood functions above depend on Cp 
which may in turn depend on other parameters, 
o p , which are, e.g., the physical parameters of a 
theory. If we write the parameters as a p + Sa p we 
can find the correction, 6a p , that minimizes y 2 by 
solving 


Sa p — 


2 PP'ddpZ 


(37) 


where 

= 1 d 2 x 2 

pp 2 da p da p > 


(38) 


is the curvature matrix for the parameters a p . 
If the x 2 were quadratic (i.e., Gaussian C) then 
Eq. 37 would be exact. Otherwise, in most cases, 
near enough to its minimum, y 2 is approximately 
quadratic and an iterative application of Eq. 37 
converges quite rapidly. The covariance matrix 
for the uncertainty in the parameters is given by 
( 5a p Sa p /) = £F~ pl . This is just an approximation 
to the Newton-Raphson technique for finding the 
root of dC/da p = 0; a similar techniqe is used in 
quadratic estimation of Cp (Tegmark 1997; Bond, 
Jaffe & Knox 1998; Oh, Spergel & Hinshaw 1998). 

As our worked example here, we parameterize 
the power spectrum by the power in B = 1 to 
11 bins, Cb■ Within each of the bins, we assume 
Cp = Cb to be independent of l. We have cho¬ 
sen the offset lognormal approximation. The % 2 


where Af. i? is the weight matrix for the band pow¬ 
ers Dp 2 . We have modeled the signal contribu¬ 
tion to the data, Dp, as an average over the power 
spectrum, times a calibration param¬ 

eter, u a (j). For simplicity, we take the prior prob¬ 
ability distribution for this parameter to be nor¬ 
mally distributed. Since the datasets have already 
been calibrated, the mean of this distribution is at 
u a = 1. The calibration parameter index, a, is a 
function of i since different power spectrum con¬ 
straints from the same dataset all share the same 
calibration uncertainty. We solve simultaneously 
for the u a and Cb', i.e., together they form the 
set of parameters, a p , in Ecp 37. For those ex¬ 
periments reported as band-powers together with 
the trace of the window function, Wl , the filter is 
taken to be 




Note that this is an approximation which ne¬ 
glects some effects of off-diagonal signal correla¬ 
tions (Knox 1999). 

For Saskatoon and COBE/DMR, our Di are 
themselves estimates of the power in bands. For 
these cases the above equation applies, but with 
Wp/£ set to a constant within the estimated band 
and zero outside. The estimated bands have dif¬ 
ferent £ ranges than the target bands. 

Instead of the curvature matrix of Eq. 38 we use 
an approximation to it that ignores a logarithmic 
term. Including this term can cause the curvature 
matrix to be non-positive definite as the iteration 
proceeds. The approximation has no influence on 


2 In most cases, it is more precisely an estimate of the weight 
matrix based on, e.g., 68% confidence upper and lower lim¬ 
its. For more details, see Appendix C. 
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our determination of the best fit power spectrum, 
but does affect the error bars. We have found that 
the effect is quite small. 

We now proceed to find the best-fit power spec¬ 
trum given different assumptions about the value 
of Xi , binnings of the power spectrum and editings 
of the data. See Appendix C for a tabulation of 
the bandpower data we are using. 

We have determined the x, only for COBE/DMR, 
Saskatoon, SP89, 0VR07, SuZIE and TOCO. 
(Although not included in Table 2, the highly con¬ 
straining measurement near l = 400 by T0C098 
(Miller et al. 1999) is also well-fit by the offset 
lognormal form.) To test the sensitivity to the 
unknown XiS we found the minimum-y 2 power 
spectrum assuming the two extremes of aq = 0 
(corresponding to lognormal) and Xi = oo (corre¬ 
sponding to Gaussian). These two power spectra 
are shown in Fig. 8. Note that both power spectra 
were derived using our measured Xi values; only 
the unknown x t values were varied. The variation 
in the results would be much greater if we let these 
Xi values be at their extremes. In what follows the 
unknown x t are set to zero. 

Sometimes when the entire likelihood function 
is unavailable, enough information is given to al¬ 
low an estimation of the x t for individual experi¬ 
ments that give single bandpower estimates. For 
example, we may be given D m where the likeli¬ 
hood is a maximum and D± where the likelihood 
has fallen by exp(—1/2), the latter giving an esti¬ 
mate of asymmetric 1 sigma error bars. We can 
then estimate the error cr and the value of x in the 
offset lognormal form from: 

a = ln(s+/s_), 

x = D m [s + s_/(s+ - s_) - 1] , (45) 

where 

D ± = D m { 1 ± s±) (46) 

defines s±. Note that x is indeterminant for s + = 
s_, but this Gaussian limit is approached from the 
large x direction. It yields an error on D — D rn 
of xa —> D m (s+ + S-)/ 2, the required Gaussian 
error. 

Although the determination of the effective er¬ 
ror is quite stable as s_ —> s + , this is not so for 
x , where small errors in s_,s+ can lead to big 
changes. The x > 0 constraint implies s_s+ > 


(s+ — s_), which can still be violated though 
s + > S-. Even if this does occur, the lognormal 
form may still be a reasonable fit if one adjusts x 
and a. On the other hand, if the probability is 
skewed towards small amplitudes, as can happen 
when there is another signal such as a dust com¬ 
ponent that has been marginalized, the lognormal 
fit can never work. 

Most often, the reason we cannot get good val¬ 
ues of x from asymmetric error bars is that the 
limits quoted in the literature are the Bayesian 
integral 16% and 84% probability ones, with an 
assumed prior (e.g., uniform in the bandpower am¬ 
plitude). We need to know the likelihood shape to 
estimate x from those s±. (One could adopt the 
lognormal and fit for these integral quantities, a 
step we have not taken.) For example, We have 
found that adopting Eq. (45) leads to negative val¬ 
ues for x for Tenerife, MAX4 and MAX5 values 
taken from the literature. 

Except for the cases noted, we always use x’s for 
the experiments where we have determined them 
well, those given in Table C2. When we must re¬ 
sort to the Eq. (45) procedure to estimate them, 
we usually set all ads to zero if they are nega¬ 
tive. For the positive values, we have tested the 
Eq. (45) x values with our x = 0 results and find 
that it makes little difference in the compressed 
bandpower results. 

Our results have some sensitivity to binning, 
partly because the visual representation does not 
describe the correlation between bins. This is es¬ 
pecially so for the last bin for which upper limits 
play a large role. A procedure often used to con¬ 
trol such variations is to introduce a prior prob¬ 
ability Pp(C-B ) = exp[S'p(Cs)] for the Cb■ In 
this section, we have implicitly adopted a uniform 
prior, which is the least prejudiced one to adopt in 
that only the data decides what the bandpowers 
should be. However, we have also experimented 
with other priors, such as a Gaussian priors, with 
Sp oc —( Cb/Cb * — l) 2 /( 2 <jp) and a “maximum 
entropy” prior, 

Sp = olCb* x 

[— {Cb/Cb*) In (Cb/Cb*) + {Cb/Cb* — 1)]- 

(47) 

Here Cb* is some target value, associated with an 
assumed prior model, Up is an adjustable variance 
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in the Gaussian case and a is an adjustable param¬ 
eter in the maximum entropy case. For example, 
the maximum entropy prior is designed to avoid 
negative Cb , relaxes to Cb* where there is little 
data and ensures some degree of smoothness in 
the determinations. However, especially in regions 
where the input shape for Cb* is changing rapidly 
(e.g., the damping tail), this procedure can give 
the wrong impression. Given the current state 
of the data, we prefer to show a few binnings to 
demonstrate sensitivity. When the data improves, 
it will be reasonable to try other priors, for exam¬ 
ple those that exert penalties if the data does not 
give continuity in t. 

The x 2 (°f Eq. 39) for the fit in Table 1 is 62 
for 63 degrees of freedom. Thus the scatter of the 
band powers is consistent with the size of their 
error bars; it provides no evidence for contamina¬ 
tion, mis-estimation of error bars, or severe non- 
Gaussianity in the probability distribution of the 
underlying signal. 

In choosing a particular binning, there is a 
tradeoff to be made between preserving shape in¬ 
formation and reducing both error bars and cor¬ 
relation. From Table 1 one can see the extent 
to which the bins are correlated. The nearest- 
neighbor bin is by far the dominant off-diagonal 
term. For this particular binning, all others are 
ten percent or less, except for the ninth bin— 
eleventh bin correlation which is 0.2. The lower l 
bands have the smallest correlations as we would 
expect from DMR. There are some very strong cor¬ 
relations in the higher bands. Fortunately, from 
Fig. 9 we see that some general features are quite 
robust under different binnings. Namely, the spec¬ 
trum is flat out to £ ~ 80 or 100, there is a rise 
to i ~ 250 or 350 and then a drop in power be¬ 
yond l ~ 350. Although the data clearly indicate 
a peak, it is difficult to locate the position to bet¬ 
ter than ±70. In the top panel the rise to the 
Doppler peak has been binned more finely than 
the others. This is a particularly difficult area to 
resolve with current data: the correlation between 
the sixth and seventh bins is —0.75. 

We also see, from Figure 10, that the general 
picture does not depend on one single dataset— 
though the error bars do get significantly larger 
when the Saskatoon dataset is ignored. Also, if 
we were to ignore OVRO, the highest £ bin would 
have error bars larger than the graph. 
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Fig. 5.— Likelihood contours for the Saskatoon 
experiment alone, as in Figure 3, but using the 
“orthogonalized bands” of Sec. 4.4. 
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5 
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566 
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799 
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11 

15 
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-0.11 

16 

39 

907 

313 

-0.17 

40 

99 

1027 

331 

-0.32 

100 

169 

3205 

610 

-0.16 

170 

249 

6216 

1040 

-0.15 

250 

399 

2394 

1106 

-0.62 

400 

999 

3328 

1218 

-0.31 

1000 

2999 

0.0 

783 



Table 1: Estimated binned power spectrum. The 
power and standard error are in /j, K 2 and are 
the numbers corresponding to the data points in 
Fig. 11. The correlation column lists the correla¬ 
tion (= £F BB ,/ yj Tbb-^b^b') between bin B and 
the next highest bin, B' = B ± 1. 
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Fig. 6.— Upper Panel: Likelihood curves for 
two upper limit cases, OVRO and SP89, and 
SP94 which had a reasonably well determined 
bandpower amplitude. Lower Panel: Likelihood 
curves for SuZIE, using two different models for 
C^, sCDM and a flat bandpower. Solid (black) 
lines are the full likelihoods. The equal-variance 
approximation (dashed curve) does extremely well 
and the offset-lognormal (dotted curves) also does 
well in treating these cases. The measure of the 
amplitude cr t h is proportional to C B ' . In the top 
panel, <7th = erg for the untilted sCDM model; in 
the bottom panel, it is in units of 10 5 AT/T. 



Fig. 7. - Likelihood contours for COBE/DMR 
and Saskatoon combined. As in Figure 3, but com¬ 
bining likelihoods from COBE/DMR and Saska¬ 
toon. For the Saskatoon calculation, we used the 
“orthogonalized bands” of Sec. 4.4. 
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Fig. 8.- Power spectra that minimize the x 2 
in Eq. 39. The solid (dashed) error bars assume 
x = 0 (x = oo) for those datasets with no de¬ 
termination of x; the two sets have been offset 
slightly for display purposes. Solid curve is stan¬ 
dard CDM. 
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Fig. 9.— Power spectra that minimize the x 2 i n 
Eq. 39 under different binnings. Solid curve is 
standard CDM. 


The upper limits from SuZIE and OVRO con¬ 
strain a region of the power spectrum otherwise 
unconstrained and put some pressure on models 
with small-scale power, such as open models. Due 
to the high interest in these limits, and the diffi¬ 
culty in interpreting the error bars in these figures 
(once again due to their non-Gaussianity) we have 
attempted to display their constraints on the spec¬ 
trum in an additional, independent manner. We 
do so by using the published bounds on Gaussian 
auto-correlation functions (GACFs). The GACF 
is given by 

Ci = c 0 e 2 e 2 c exp^e 2 e^ (48) 

where Co is the amplitude of the real-space cor¬ 
relation function at zero lag and 9 C is called the 
Gaussian coherence angle. For various choices of 
the coherence angle, the data were used to set lim¬ 
its on Co- The curves in Fig. 11 trace the peak of 
the GACF with Co at the 95% confidence upper 
limit as 9 C is varied. 

As we have emphasized earlier with the band- 
powers, it can often be misleading to interpret 
the covariance matrix of the parameters (derived 
from the Fisher matrix) as indicating the 68% con¬ 
fidence region, since the 68% confidence region 
may extend beyond the region of validity for the 
quadratic approximation to x 2 - Even in such a 
case though, the quadratic procedure still may be 
useful just for finding the minimum, which might 
be a good point to begin further investigation of 
the x 2 surface without the quadratic approxima¬ 
tion. Non-Gaussianity can be especially severe 
when the parameters are cosmological parameters. 

9. Summary and Discussion 

We have argued that cosmological parameters 
should be constrained from CMB datasets via 
an initial step of determining constraints on the 
power spectrum. These power spectrum con¬ 
straints can themselves be viewed as a compressed 
version of the pixelized data. We call this process 
radical compression since the resulting dataset is 
orders of magnitude smaller than the original. One 
must be careful in using this compressed data 
to take into account the non-Gaussian nature of 
their probability distributions; ignoring the non- 
Gaussianity while attempting to constrain param¬ 
eters results in a bias. The offset lognormal and 
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equal variance approximations capture its salient 
characteristics. They are both specified by the 
mode, variance and the noise contribution to the 
variance (a:). Use of these forms allows for a 
very simple y 2 type treatment of the band-power 
data—without the bias. 

While we have found these approximations to 
the likelihood functions to be quite adequate for 
dealing with the data we have explored to date, 
and have given quite general arguments for why 
the tails behave as they do, our checks have not 
been exhaustive. For example, some quoted CMB 
anisotropy results are skewed to lower rather than 
higher amplitudes (perhaps due to fitting out fore¬ 
grounds or systematics), a situation that the off¬ 
set lognormal cannot fit. Just as one computes 
the curvature about the maximum likelihood, so 
one can consider computing a skewness that would 
encapsulate such behavior, but we will leave the 
search for further likelihood function approxima¬ 
tions to further exploration. 

We have shown that the offset lognormal ap¬ 
proximation applied to a two-parameter (a$ and 
n s ) family of CDM models works very well. We 
have also used this form to find the maximum- 
likelihood binned power spectrum, given the band- 
power data. The resulting graphs provide a visual 
representation of the power spectrum constraints 
that is, in our opinion, far superior to plotting all 
the band-power data on top of each other. 

The exercise of estimating the binned power 
spectrum immediately raises the question of how 
well it would work to estimate cosmological pa¬ 
rameters using these as a “super-radically com¬ 
pressed dataset”. We plan to pursue this question 
in future work. 

Although our examples have focused on using 
our approximations in order to derive parameter 
constraints from more than one dataset, we be¬ 
lieve they may also prove useful for estimating 
cosmological parameters from single, very power¬ 
ful datasets, such as those that are expected to 
come from a number of experiments over the next 
decade. We must note though that once a dataset 
has sufficient “spectral resolving power” and dy¬ 
namic range there is another approach that can 
be used to remove the cosmic bias. This alter¬ 
native approach was suggested in Bond, Jaffe & 
Knox (1998) and Seljak (1997) and successfully 
applied to simulated MAP data in Oh, Spergel 


& Hinshaw (1998). The idea is to exploit the 
fact that we expect there to be no fine features in 
the CMB power spectrum and therefore use some 
smoothed version of the estimated power spectrum 
to calculate the Fisher matrix. Heuristically one 
expects this to remove the bias, since upward- 
fluctuating points no longer receive less weight 
than downward-fluctuating points. Although this 
smoothing technique is quite likely to be success¬ 
ful, we point out that, unlike our ansatz, it relies 
on an assumption of the smoothness of the power 
spectrum. 

One of our main objectives with this paper is 
to provide observers with a method for present¬ 
ing their results that will allow efficient combina¬ 
tion with the results of others in order to create 
a joint determination of cosmological parameters. 
The method is fully described in Appendix B. In 
this appendix we also discuss complications due to 
overlapping sky coverage and upper limits. 
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Fig. 10.- Power spectra that minimize the y 2 in 
Eq. 39 under different editings of the data. Top 
panel: no TOCO. Middle panel: no MSAM, bot¬ 
tom panel: no Saskatoon. Solid curve is standard 
CDM. 
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Fig. 11. Binned power spectrum that minimizes 
the x 2 in Eq. 39 and which is tabulated in Table 1. 
In addition to the data presented in Table 2 these 
results also include T0C098(Miller et al. 1999). 
Solid curve is standard CDM, dashed curve is an 
open CDM model with matter density one third of 
critical, baryon mass density of 0.035 times critical 
and a Hubble constant of 60 km/sec/Mpc. The 
dotted curve is a prediction for local cosmic strings 
(Allen et al. 1997). Curves at high l indicate upper 
limits derived from OVRO (left) and SuZIE (right) 
data. 
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A. Signal-to-Noise Eigenmodes and the general form of the likelihood 

We start with the full-sky likelihood, either in the form of Eq. 5 or in terms of Z = ln(C + x), Eq. 17. 
We transform this form to signal-to-noise eigenmodes (e.g., Bond (1994);Bunn & White (1997);Bond, Jaffe 
& Knox (1998)). Here, the data in the signal-to-noise eigenmode basis are <4, with diagonal covariance, 

( dkdk') = (1 + cr 2 h A k)$kk' ■ (Al) 


We allow the amplitude, cr 2 h , for the signal contribution to vary, since the eigenmodes only depends upon the 
shape of the signal covariance, itself dependent on the input power spectrum. If we define the signal-to-noise 
transformation with power spectrum C^ hape , then Eq. Al is valid for all Cg = cr 2 1 :i C| hape . The eigenvalue for 
mode k is cr 2 h Afc, with units of (signal-to-noise) 2 . 

Because the Gaussian variables, <4, are statistically independent, the likelihood is made up of independent 
contributions, 


In C/C 


IV f ^ ( g t 2 h - g th) A fc 

2 fc U + 1 + 


+ In 


1 + 

1 + ^th^fc. 


(A2) 


where a “hat” refers to the quantity at the likelihood maximum. 

Introducing the number of modes with a given value of A, 5 (A), and the cumulative number of modes 
G(A) = this can be written in the suggestive form 


In C/C 


— 2 / dG W 


xl 


1 + 


Z(\)-Z(\) 
(e-[zW-zW] _ ^ 


(A3) 


Here X\ = J2\ k =\ d k/9W is the \ 2 per degree of freedom for modes of the A. On average, y 2 approaches 
(1 + ir 2 h A), in which case the form is a sum of terms like Eq. 16. The variables Z{ A) = ln(l + cr 2 h A) are 
analogous to the form we have been using if A -1 is interpreted as a special case of the offset x. The integral 
is to be interpreted in the Stieltjes sense, as a sum over the discrete A spectrum, YlxdWi- ■ •)• 

Consider what happens asymptotically with increasing tr 2 h - The modes with tr 2 h A > 1 contribute, so 
GVth 2 ) hi cr t 2 h is the leading behavior. As er 2 h goes up, more eigenmodes may contribute, In C/C decreases 
faster, modifying the tail. 


B. Data Reporting Recipe 

We recommend reporting future (and, if possible, past) CMB results in a form that will render them 
amenable to this “radically-compressed” analysis. Thus, experimenters and phenomenologists ought to 
provide estimates of 

• Ct = £(£ + 1)CV/(27t), the power spectrum in appropriate bins; 

• J~7 P }. the curvature or covariance matrix of the power spectrum estimates; and 

• xg, the quantity such that Zg = ln(C^ + xg) is approximately distributed as a Gaussian. 

We here provide an outline of the steps needed to provide the appropriate information. Current listings of 
publicly-available results will be posted at http://www. cita.utoronto. ca/^knox/radical .html. Please 
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contact the authors to have results included. 

1) Divide the power spectrum into discrete bins. To prevent significant loss of shape information, the bins 
should not be too large. However, there may be a problem with making the bins too small. The closer we 
are to the case of well-determined, independent bins, the better our ansatz is expected to work. Thus bins 
should be large enough to keep relative error bars smaller than 100% and bin to bin correlations small. The 
shape-dependence resulting from coarse bins can be reduced by the use of a filter function for each band 
(see Knox (1999)), as was done with the MSAM (Medium Scale Anisotropy Measurement) three-year data 
(Wilson et al. 1999). 

2) Find the power in each bin that maximizes the likelihood and evaluate the curvature matrix at this point. 
This can be done using your favorite likelihood search algorithm. For COBE/DMR and Saskatoon we have 
used the iterative scheme described in Bond, Jaffe & Knox (1998). Our current implementation does not 
include a transformation to S/N-eigenmode space. In a forthcoming paper (Knox & Jaffe 1998), we will 
provide detailed information on the implementation of this quadratic estimator and an appropriate sample 
set of programs. 

3) Estimate x for each of the bins. If the likelihood is calculated explicitly, this can simply be done by 
numerical fitting to the functional form of our ansatz, Eqs. 8 and following, or Eq. 34. If the likelihood peak 
is determined by the iterative quadratic scheme or some other method which also calculates the curvature 
matrix (or, less-preferably, Fisher matrix), the appropriate formulae from Sec. 2.2 (for total-power mapping 
experiments) or Sec. 4.1 (Eq. 28 for chopping experiments). 

4) Do not alter the curvature matrix by folding in the calibration uncertainty in any way. Report the 
calibration uncertainty separately. 

B.l. Special Cases 

1) Overlapping sky coverage. Power spectrum constraints will be correlated if they are from datasets with 
overlapping sky coverage and sensitivity to similar angular scales. We have no general theory of these 
correlations. Proper combination of overlapping datasets appears to require a joint likelihood analysis to 
produce their combined constraints on the power spectrum. 

2) Upper Limits. For datasets that can only provide an upper limit to the power spectrum amplitude, a 
simple option would be to calculate the full likelihood directly and simply fit to one of the two forms for 
C? > 0 but with a negative or very small Ct (and such that Ci + Xi > 0). This is what we have done for 
Fig. 6 which demonstrates that both of our approximate forms work fairly well, especially the full form of 
Section 2.2.2. 

Although the results in Fig. 6 look quite impressive, they say nothing about how well the window function 
tells us which regions of the power spectrum are being constrained by the data. In other words, does the 
trace of the window function make a good filter function? Therefore, we present the following alternative 
method for reporting upper limits which includes a prescription for creation of a filter function. 

The data can be reported as amplitudes of signal-to-noise eigenmodes and their eigenvalues (see Ap¬ 
pendix A). One need only report the modes with the largest eigenvalues. The number of modes that it is 
necessary to report is likely to be quite small. The likelihood of of h , where C( = of h C| hape , is then: 

X 2 s= -2 ln£ = ln (! + °ti A) + x ( B1 ) 

i L 1 + <T th A *J 

where is the amplitude of the ?'th mode, A, is its eigenvalue, and C| hape is the power spectrum used to 
define the S/N-modes. Of course, we want the likelihood to be a function of, e.g., a binned power spectrum. 
It is, via: 

4 ( b2 ) 

B 
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where we have assumed a flat power spectrum (C| hapc = C = const), Jb is related to the window function 
as in Eq. 44 or, better, derived from the Fisher matrix as described in Bond, Jaffe & Knox (1998). It is 
straightforward to calculate the derivatives of this y 2 with respect to Cb in order to combine upper limits 
with detections and perform the search procedure described in Section 8. 


C. Band Powers 

The numbers in Table 2 were used to form part of the weight matrix in Eq. 39: W t j = 1 / crfS-ij where a- t is 
from the “standard error” column of the Table. These standard errors are derived from published likelihood 
maxima ( Di ), 68% confidence upper limits (£)“) and lower limits ( D\). Since the upward and downward 
excursions from the mode to the upper and lower limits are usually different, there is some freedom in 
assigning a single standard error. We define Ui as an average of these excursions: 

= [( D u - A) + (A - A')] /2. (Cl) 

If the published number is linear instead of quadratic, then Di = ST?, etc. and the above equation still 
applies. We have also tried producing a, from averaging the inverse square of the upward and downward 
deviations, and found no significant difference in the results (power in bands changes by less than 10% of 
the error bar). 

We also found not much difference in the results depending on how we treated calibration uncertainty. 
Most experiments report their upper and lower limits with calibration uncertainty included. Only for Saska¬ 
toon, MSAM, QMAP, T0C097 and T0C098 have we included calibration uncertainty by treating it as an 
independent parameter (u a in Eq. 39). 

Missing from the table are detections from the White Dish (Tucker et al. 1993) experiment. The White 
Dish dataset was compressed to two band-power detections with sensitivity in the range £ ~ 300 to ~ 600. 
A recent reanalysis (Ratra et al. 1997), results in upper limits which are sufficiently loose that including 
them would make no difference in our power spectrum determination. Both these analyses use only a small 
subset of the available data; a complete analysis will probably provide detections (Griffin et al. 1998). 
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Table 2 

Input band powers, standard errors, and noise contributions to the variance (x) in /j, K 2 . 


dataset t--t+ BP err x reference 


firs 

4-25 

927.8 

440.7 

?? 

tenerife595 

13-28 

1164 

705.9 

?? 

BAM 

31 -90 

870.3 

478.5 

?? 

SP91-6225-63 

35-98 

892.2 

382.8 

?? 

SP94-62-4ch 

33-95 

837.5 

384.6 

?? 

SP94-62-3ch 

40-114 

1632 

584.5 

?? 

pyth96I-II-III 

52-99 

2916 

1351 

?? 

pyth96III 

132-237 

3364 

1565 

?? 

qmap_q 

79-143 

2704 

520.0 

11 

qmap_kal 

60-101 

2209 

604.5 

11 

qmap_ka2 

99-153 

3481 

760.5 

11 

toco97_3 

45-81 

1600 

720 

500 

toco97_4 

70-108 

2040 

600 

100 

toco97_5 

90-138 

4900 

850 

0 

toco97_6 

135-180 

7850 

1300 

0 

toco97_7 

170-237 

7170 

1300 

3000 

sp89 

87-247 

0.0 

1459 

1830 

argo 

69-144 

1060 

613.0 

11 

MAX4av 

89-249 

2586 

876.9 

11 

MAX5av 

89-249 

1511 

573.8 

11 

ovro22 

362-759 

3127 

813.1 

11 

catl 

349-473 

2583 

1512 

11 

cat2 

559-709 

2401 

1584 

11 

catl-98 

349-473 

3937 

2322 

15700 

cat2-98 

559-709 

0.0 

5031 

15700 

OVRO 

1147 2425 

72.4 

380.3 

367 

SuZIE 

1366-3000 

354.3 

753.4 

122’ 


Ganga et al. (1994); Bond (1994) 

a 

Tucker et al. (1997) 

Gaier et al. (1991); Schuster et al. (1991) 
Gundersen et al. (1995) 

Gundersen et al. (1995) 

Platt et al. (1997); Ruhl et al. (1995) 

Platt et al. (1997); Ruhl et al. (1995) 

b 

b 

b 

Torbet et al. (1999) 

Torbet et al. (1999) 

Torbet et al. (1999) 

Torbet et al. (1999) 

Torbet et al. (1999) 

Meinhold & Lubin (1989) 

Masi et al. (1996); DeBernardis et al. (1994) 
Clapp et al. (1994); Devlin et al. (1994) 

Lim et al. (1996); Lim et al. (1996) 

Leitch (1998) 

Scott et al. (1996) 

Scott et al. (1996) 

Baker et al. (1998) 

Baker et al. (1998) 

Myers, Readhead & Lawrence (1993) 

Church et al. (1997) 


a Guitteriez de la Cruz et al. (1995); Hancock et al. (1994); Watson et al. (1992) 

b Devlin et al. (1998); Herbig et al. (1998); de Oliveira-Costa et al. (1998) 

s x = 419 is a better fit for the equal variance form (not used in the calculations of Section 8). 


Note.— These data, their window functions, and also data for MSAM-3yr (Wilson et al. 
1999), Saskatoon (Netterfield et al. 1997) and COBE/DMR (Bennett et al. 1996) are available at 
http://www.cita.utoronto.ca/~knox/radical.html. These last three datasets are not shown in the table 
here because they have multiple bands with covariance matrices. 



